Speech/laughter classification in meeting audio
Abstract
In this paper, harmonicity information is incorporated into acoustic features to detect laughter and speech segments. The system uses a hidden Markov model (HMM) classifier trained on features from Pitch and Harmonic Frequency Scale (PHFS) based subband filters. The harmonicity of a signal can be determined from the variation of its pitch and harmonics. Cascaded subband filters, spread along the pitch and harmonic frequency scale, are used to describe this harmonicity information. The pitch bandwidth of the first layer spans 80 Hz to 300 Hz, and the entire band spans 80 Hz to 8 kHz. The experiments are conducted on the ICSI meeting corpus (‘BMR’ and ‘BED’ meetings). We achieve an average error rate of 0.84% for the ‘BMR’ meeting and 3.64% for the ‘BED’ meeting in segment-level speech and laughter detection. The results show that the proposed PHFS-based feature is robust and effective.
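To make the idea of a pitch-and-harmonic frequency scale concrete, the following is a minimal sketch and not the authors' PHFS filter design, which the abstract does not specify. It assumes, purely for illustration, that each subband is centred on a harmonic k·f0 of an estimated pitch f0 and is one pitch interval wide, with f0 restricted to the 80 Hz to 300 Hz pitch range and the whole bank clipped to the 80 Hz to 8 kHz band quoted above. The function name `harmonic_band_edges` and its parameters are hypothetical.

```python
def harmonic_band_edges(f0_hz, f_min_hz=80.0, f_max_hz=8000.0,
                        pitch_range_hz=(80.0, 300.0)):
    """Return (low, high) edges in Hz of subbands centred on harmonics of f0.

    Assumption for illustration only: one band per harmonic k * f0, each
    f0 wide, clipped to [f_min_hz, f_max_hz]. This is not the PHFS design
    from the paper, only a simple harmonic-scale layout.
    """
    pitch_lo, pitch_hi = pitch_range_hz
    if not pitch_lo <= f0_hz <= pitch_hi:
        raise ValueError("f0 must lie inside the assumed pitch range")

    bands = []
    k = 1
    while k * f0_hz <= f_max_hz:
        centre = k * f0_hz
        low = max(centre - f0_hz / 2.0, f_min_hz)    # clip to lower band limit
        high = min(centre + f0_hz / 2.0, f_max_hz)   # clip to upper band limit
        bands.append((low, high))
        k += 1
    return bands


if __name__ == "__main__":
    # Example: a 100 Hz pitch estimate gives harmonics at 100, 200, ..., 8000 Hz.
    for low, high in harmonic_band_edges(100.0)[:3]:
        print(f"{low:.0f} Hz to {high:.0f} Hz")
```

Because the band layout follows the pitch estimate, energy measured in these bands changes with how well the signal's partials line up with the harmonics, which is one simple way a feature could reflect harmonicity.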
Similar resources
Laughter Detection in Noisy Settings
Spontaneous human speech contains many sounds that are not proper speech yet carry meaning, laughter being a good example. Distinguishing such sounds from speech sounds could improve speech recognition systems as well as widen the communicative range of automatic dialogue systems. Our goal is to develop methods for automatic classification of non-speech vocal sounds. As laughter varies widely be...
Fusion for Audio-Visual Laughter Detection
Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by combining (fusing) the results of a separate audio and video classifier on the decision level. ...
Decision-Level Fusion for Audio-Visual Laughter Detection
Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audiovisual laughter detection is performed by fusing the results of separate audio and video classifiers on the decision level. This r...
Combining acoustic and visual features to detect laughter in adults' speech
Laughter can not only convey the affective state of the speaker but also be perceived differently based on the context in which it is used. In this paper, we focus on detecting laughter in adults’ speech using the MAHNOB laughter database. The paper explores the use of novel long-term acoustic features to capture the periodic nature of laughter and the use of computer vision-based smile feature...
Laughter and filler detection in naturalistic audio
Laughter and fillers are common phenomena in speech and play an important role in communication. In this study, we present Deep Neural Network (DNN) and Convolutional Neural Network (CNN) based systems to classify non-verbal cues (laughter and fillers) from verbal speech in naturalistic audio. We propose improvements over a deep learning system proposed in [1]. Particularly, we propose a simp...